In this paper, we propose Osprey, a mask-text instruction tuning approach that extends MLLMs by incorporating fine-grained mask regions into language instructions.
Experimental results demonstrate Osprey's superiority in various region understanding tasks, showcasing its new capability for pixel-level instruction tuning.